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Abstract 

Bivariate time series which display nonstationaLiy be- 
havior, such as cycles or long term trends, are co mm on 
in fields such as oceamography and meteorology. These 
are usually very large scale data sets ajid often may 
contain long gaps of missing values in one or both se- 
ries, with the gaps perhaps occurring at different time 
periods in the two series. We present a simplified but 
effective method of interactively examining and filling 
in the missing values in such series using extensions of 
the methods available in AGSS, an APL2-based statis- 
tical softw’aire package. Our method allows for possible 
detrending and removal of seasonal components before 
automatically estimating airbitrary patterns of missing 
values for each series. Interactive bivariate spectral 
analysis can then be performed on the detrended and 
deseasonalized interpolated data if desired. We illus- 
trate our results using a bivariate time series of ocean 
current velocities measured off the California coast. 

1 Introduction 

Gaps of missing values of various sizes are a common 
problem in many data sets. In oceanographic data, 
for example, a single large gap may arise in the gath- 
ering of tidai data when an instrument stops working 
and the malfunction is not detected for several days. 
Many small gaps are more characteristic of data gath- 
ered from satellites. The missing value problem is com- 
plicated for bivariate series in that the gaps may not 
fail at the same time periods in both series. Ad hoc 
univariate methods, such as basing ‘^suitable” replace- 
ment values on the range of values assumed by neigh- 
boring points or points of the same periodicity, fail 
to account for possible cross correlation in the data. 
In order to successfully analyze the spectrum of gap- 
py data sets, or use the data for other purposes, the 
missing values need to be estimated in a way that Is 
characteristic of the rest of the bivariate data set. 



The problem of missing values in time series has 
been studied by several authors in recent years, pri- 
marily in a state space framework. Jones (1980) 
used a KaJmain filter recursion to calculate the ex- 
aict likelihood of a univariate stationary autoregressive 
moving average (ARMA) process with missing values, 
while Harvey and Pierce (1984) and Kohn and Ans- 
ley (1986) extended the Kalman filtering method to 
nonstationary autoregressive integrated moving aver- 
age (AJir\L\) processes. Ansley and Kohn (1985) gave 
a method of recursively calculating the likelihood for a 
multivariate state space model with incompletely spec- 
ified initial conditions which can be used to interpolate 
an arbitrary pattern of missing values in multivariate 
time series. Both the computation and derivation are 
much simpler in the univariate case than in the mul- 
tivariate case. More recently, Ljung (1989) derived an 
exact expression for the estimates of missing values in 
a univariate ARTMA process in a form that is useful 
for examining the estimates and their mean squared er- 
rors. For an arbitrary pattern of missing values, how- 
ever, the computations are not very efficient. None 
of the above methods is thus easy to implement in 
practice for bivariate series which are possibly nonsta- 
tionary and have arbitrary patterns of missing values 
in both series. 

In this paper, we present an algorithm for semi- 
automatically filling in gaps in bivariate time series, 
allowing for trends, cycles, and cross correlation in the 
data. The interactive implementation of the adgorith- 
m aiUows for visual exaimination of the data at each 
step, giving the practitioner the opportunity to view 
the original data with missing values, the "patched” 
data in which the missing values have been filled in 
using linear interpolation, and the estimated autospec- 
trum of the crudely interpolated data, .After this ex- 
aunination, one may choose to remove a trend or cycles 
from the data. The remaining series is automatically 
modeled as an autoregressive process and the estimat- 
ed model is used for interpolation. The method allows 
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for joint interpolation of two correlated series, incor- 
porating an estimate of the autocorrelation for each 
series, and the cross correlation between the two se- 
ries, into the interpolated values. At the end of the 
interpolation phase, the user has the choice of exam- 
ining the coherence function of the interpolated series, 
as well as producing more detailed plots of paxticulax 
segments of the series. The following section presents 
the interpolation algorithm in detail, while Section 3 
gives an application of the aJgorithm to a bivariate se- 
ries of ocean current velocity meter reaxiings measured 
off the California coast. 

2 Algorithm 

The following algorithm has been coded in APL2 
using the IBM APL2 AGSS program as a comput- 
ing platform. Thus functions such as regression, the 
Fast Fourier Transform used for computing the peri- 
odogram of the series, and random number generation 
from AGSS are used, as well as some of the AGSS 
graphics screens. The aJgorithm is available in a AGSS 
libraxy from the authors for mainframe or microcom- 
puter data. The algorithm is as follows: 

1. The user is asked to enter the name of the original 
series containing gaps and the series is plotted. 
Denote this series by r(t), t = 1, 2, . . . , n. If there 
are two series containing gaps, and the two series 
are cross correlated in some way, the user is asked 
to enter the name of the second series and the 
second series is plotted as well. Denote this series 
t>y y(t)j t = 1 , 2 , . . . , n. The two series must have 
the same length, but the location and length of 
gap>s in the series may differ. 

2. The user is asked to enter the number used on the 
data record to indicate a missing value. 

3. The program then computes the locations of the 
gaps and fills in the missing values for each series 
by linearly interpolating between the two points 
on either side of the gap. Denote the linearly in- 
terpolated series as Xi(t) and y\(t) respectively. 
The resulting series are then plotted and can be 
visually examined by the user to decide whether 
removal of a linear trend is necessary. 

Note: Steps 4-8 are applied to both the Xi(t) and 
7/1 (t) series in exactly the same manner. Only the 
results for the x\{t) series are given below. 

4. If 50 desired, the program removes a linear trend 
from each series. The trend is estimated using 



least squares applied to the initial ‘patched” se- 
ries. The resulting series is 

X2(t) = Xi(t) — d — St, 

where a is the estimated constant and S is the 
estimated slope. 

5. The sample periodogram is automatically esti- 
mated and plotted for the interpolated and de- 
trended data, X 2 (t). (see, for example. Priestly 
(1981) for a definition of the periodogram and its 
interpretation) . 

6. The program caJculates the probability of obtain- 
ing the computed values for the 20 largest val- 
ues of the normalized periodogram under the as- 
sumption that xz{t) is a Gaussian white noise pro- 
cess. A small probability indicates that a cycle 
may be present in the data. The probabilities 
are computed using the large sample test statis- 
tic for Ijyj = 1,2, [n/2], where Ij denotes the 

largest ordinate of the periodogram. (Priestly 
1981, p.407). 

7. Using the information obtained in the previous 
step, as well as any intuitive or physical knowledge 
of cyclic behavior in the series, the user specifies 
cycles to be estimated and removed from the in- 
terpolated and detrended data, if desired. The 
cycles are assumed to be of the form 

J 

St = y^{7jC03(w;t) +/3jSin(w;t)}, 
j=i 

where ujj = 27t/j are the frequencies that you 
would like to remove and J is the number of cy- 
cles. The and Pj are estimated using least 
squares. The resulting series is Xs(t) = ^^(t) — if. 

3. A first order autoregressive (AR) model is fitted 
to the detrended and deseason alized data Xs(t). 
An .AR(1) model has the form 

X 3 (t) = d)xz(t - 1) 4- Ox(t), t = 2, 3, . . . , n. 

We assxime that a^(t) The pa- 

rameter is estimated using least squares. The 
residual series dx(t) is 

Ox(t) = xa(0 - (pxzit - 1), t = 2, 3, . . . , n. 

9. The variance of {o^(t)} is calculated and a white 
noise series of length n having the distribution 
iV(0,(73^) is generated. Denote this series by 

<( 0 . 
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10. Lec I denoCe the length of a particular gap in the 
series and lec xz(t) and zz(t 4- / -r 1) denote the 
points on either side of the gap. The program 
forecasts and backcasts from each end of the gap 
using the following recursive equations for j = 
1 , 2 ,...,/. 

xz{t 4* j) = (pxz{t -r j — 1) 4- 4" j) 

X3(t -4 / 4- 1 - j) = 0x3(t 4- / 4- 2 - j) 

4-a^(t 4- / 4- 1 -i). 

Then the incerpolated value is 

4- j) = wijxz(t 4- j) -4 W2jxz(t 4- / 4* 1 - 
where w^j = 1 — (j/l 4- 1) and = 1 — w\j. 

11. If interpo lacing values for two correlated series, 

the standaird deviation of the residual series 
(dx(t)} and found in Step 8 is calculat- 

ed and the sample cross correlation at lag 0 be- 
tween (dx(t)} aJid {dy(t)} is computed. Denote 
this by c. A white noise series of length n having 
the distribution iV(0, is generated. Denote 
this series by A second white noise series of 

length n is generated using the following relation: 

= c(0-a.y/<5-o^)4(t) -f \/l -C^a[{t). 

12. The values for the yz{t) series axe interpolated as 
in Step 9, with a^^{t 4- j) replaced by a'y(t 4- j) 

13. The estimated trend and cycle are added back to 
the interpolated series and the series containing 
the rinal estimates of the missing values is plotted. 

14. The user may choose to plot the coherence func- 
tion for the detrended and deseasonaiized incer- 
polated series if desired. 

15. The user may also choose to plot more detailed 
segments of the final interpolated series if desired. 

Using a weighted average of backward and forward 
forecasts made horn each end of a gap using an es- 
timated univariate .\RMA model, as was used in Step 
10, was discussed by Abraham (1981). He calcxilates 
the weights to minimize che mean-squared error of the 
interpolated value, thus the weights depend in a com- 
plicated way on both the estimated model parameters 
and the length of gap. Our method, adthough sim- 
ple, is intuitively appealing in that it gives less weight 
to interpolated values at long lead times and is easy 
to implement when the lengths of the gaps are dif- 
ferent. Note that in Step 10, we include a simulated 



noise term in the usual expressions for the backward 
and forw'ard forecasts of an AR(1) model. This is to 
eliminate unrealistic “smoothness” in the interpolated 
values, which will occur if the gap is very long. Real- 
istic noise in the series is important if che interpolated 
series will be used to estimate the spectral density of 
the series. Similarly, in Step 11 we incorporate 2 in esti- 
mate of the contemporaneous correlation between the 
two series into the estimates of the incerpolated values 
of the second series by including a noise term taiken 
from a simulated bivariate Normal pair of series with 
correlation a The method for generating a bbrniate 
Normal pair with specified correlation is taken from 
Lewis and Orav (1989, p.301). A discussion of con- 
temporaneous bivariate time series models made be 
found in Camacho, Hipel, and McLeod (1987). 

3 An Example 

We apply the above algorithm to a vector pair of 
ocean current velocities collected off Point Sur, Cal- 
ifornia over the period 0000 hours, Sept. 19, 1990 
to 2300 hours, Oct. 30, 1990, a total period of 1008 
hours. Current velocities are just one set of variables 
which are collected regularly by oceaLnographers at the 
Naval Postgraduate School in order to provide infor- 
mation related to the long term variability in sea sur- 
face temperatures off the California coast. The veloci- 
ties were measured using a paddlewheel and electronic 
counter assembly located at the top of the recording 
unit placed at a depth of 350 meters. Velocity, in units 
of cm/s, was determined from the number of revolu- 
tions made by the paddlewheel during each sampling 
interval, to an accuracy of ±1.0 cm/s. The data was 
initially recorded at 30 minute intervals. After initial 
visual inspection for outliers or periods suspect of in- 
strument failures, and mainual editing if necessary/, the 
data was ffltered using a Cosine-Lanczos filter with 
a centered 25 point data window and interpolated to 
specified 60 minute inter\'als. .\t this point, there re- 
mained a period of 63 consecutive hours of missing da- 
ta, in which the data gathering instruments were not 
working. There were also a few scattered individual 
missing values. 

Figures 1 and 2 show plots of the E-W (or u) com- 
ponent of the current velocity, and the N-S (or v) com- 
ponent of the current, with missing values coded as 0. 
Figures 3 and 4 depict the same series with missing 
values “patched” using linear interpolation. There a^>- 
pear to be regular cycles in the datai, as expected for 
current data, as well as the presence of a long term 
trend. We fit a linear trend to both the u and v cur- 
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renc components, with estimated constants of 3.91 and 
1.60, respectively, and estimated slope coeflBcients of 
-0.0018 and -0.0003. Standard t-tests on the signifi- 
cance of the regression coefficients are not appropriate 
because the residuals are not assumed to be uncor- 
related. Figures 5 and 6 show the periodogram for 
each detrended series. Diurnal and semi-diumal cy- 
cles are clearly indicated for the u-component of cur- 
rent velocity, with only the diumaJ cycle clearly seen 
in the u-component. In addition, there app>ear 3 to be 
some long range dependence in the data, as evidenced 
by the laxge values of the periodogram at small fre- 
quencies, which remain even after the removal of a 
long term trend. See Lewis and Ray (1992) for a dis- 
cussion of long range dependence in sea surface data. 
Based on the approximate p- values of the test statistic 
for each of the 20 largest periodogram ordinates and 
knowledge of the tidal cycles, we remove cycles at fre- 
quencies (in cycles per 1008 hours) 81, 82, and 84 in 
the u-component and frequencies 42, 81, 82, and 84 
in the u-component. Table 1 gives the values of the 
normalized periodogram values at these ordinates and 
the resulting p-values for the test statistic. After this 
step, the missing values are automaticaJly estimated 
for the detrended and deseasonadized data. Figures 7 
and 8 show the two series with final estimates of the 
missing values, after the trend and cycles have been 
added back to the series. The estimated values appear 
to follow the pattern of the data quite nicely. 

We study the correlation between the two series 
in more detail by looking at the coherence function 
for the two interpolated, detrended and deseasonalized 
series. The coherence is initially computed ass umin g 
a cosine arch window of length 7. The user has the 
option of changing the window at any point in the 
coherence analysis. Figure 9 shows the cross-phase, 
cross-amplitude, and cross-coherence functions of the 
two series. The cross- amplitude spectrum shows de- 
pendence between the series at fairly low frequencies, 
but the (normalized) coherence measiu*e shows that 
the dependence extends to high frequencies as well. 
No systematic effect can be seen in the phase spec- 
trum. 



.Additionally, the enlarged segment in Figure 10 
depicts the two series with values plotted as vertical 
lines drawn from the x-axis. The segment partially 
includes the interpolated values shown in Figures 7 
and 3. No apparent difference between the known and 
the interpolated values can be seen. 



4 Summciry 

We have presented a simple algorithm which permits 
interpolation of arbitrary patterns of missing vaJues in 
both univariate and bivariate time series, allowing for 
the possibility of non-stationarity. The implementa- 
tion is interactive and has graphical capabilities avail- 
able at each step. It is aJso much easier to implement 
in practice than the state-space approach of Ansley 
and Kohn(1985), which requires modified versions of 
the Kalman filter and the fixed point smoothing al- 
gorithm. The estimated contemporaneous correlation 
between the two series is used in the interpolation al- 
gorithm in order to estimate the missing values in a 
manner that is consistent with the rest of the data. 
Although we have assumed that each series follows a 
simple AR(1) model, the algorithm could easily be ex- 
tended to model each series as an ARiVlA(p, q) model, 
with the orders of p and q chosen by the user sifter 
exaimination of the sample autocorrelation and pair- 
tial autocorrelation functions of the detrended and de- 
seasonaJized “patched” data. Functions necessary to 
compute the sample correlation functions and estimate 
the parameters of an ARMA(p, g) model are already 
present in the IBM APL2 AGSS package. (The AGSS 
application disdussed here is available from the au- 
thors; the AGSS package is available from EBM.) 
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Tables and Figures 



Table 1: Normalized Periodogram Values for Ocean 
Current Velocities 



U-component of current velocity 


Frequency 


Norm. Periodogram Value 


! p-value 


1 


106,14 


0.00 


81 


88.52 


0.00 


2 


35.67 


0.00 


82 


22.42 


0.00 


84 


20.85 


0.00 


V-componenc of current velocity 


Frequency 


1 Norm. Periodogram Value I 


1 pHvaiue 


31 


195.51 


0.00 


1 


69.88 


0.00 


84 


34.81 


0.00 


2 


20.90 


0.00 


32 


14.40 


0.00 


42 


10.09 


0.11 
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Figure 1: U-component of Current Velocity (missing 
values coded as O^s) 
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Figure 2: V*-component of Current Velocity (missing 
values coded as O's) 
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Figure 3: U-component of Current Velocity (missing Figure 5: Periodogram of U-component of Cinrenc Ve- 
values linearly interpolated) locity aifter linear detrending 
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Figure 4: V-component of Current Velocity (missing 
vaiues linearly interpolated) 



Figure 6: Periodogram of V-component of Current Ve- 
locity after linear detrending 
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Figure 7: U-coraponent of Current Velocity (missing 
values estimated using complete interpolation proce- 
dure) 



Figure 9: Cross- amplitude Spectrum (top), Phase 

Spectrum (middle), and Coherence (bottom) of 
U-component and V-component of Current Velocity 
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Figure 3: V-component of Current Velocity (missing 
values estimated using the complete interpolation pro- 
cedure) 



Figure 10: Detailed segment of U-component and 

V-component of Current Velocity 2 Lfter 6nai estima- 
tion of missing values 
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